Multi-modal, multi-technique vehicle signal detection

ABSTRACT

A vehicle includes one or more cameras that capture a plurality of two-dimensional images of a three-dimensional object. A light detector and/or a semantic classifier search within those images for lights of the three-dimensional object. A vehicle signal detection module fuses information from the light detector and/or the semantic classifier to produce a semantic meaning for the lights. The vehicle can be controlled based on the semantic meaning. Further, the vehicle can include a depth sensor and an object projector. The object projector can determine regions of interest within the two-dimensional images, based on the depth sensor. The light detector and/or the semantic classifier can use these regions of interest to efficiently perform the search for the lights.

PRIORITY DATA

This application is a Continuation Application of and claims priority to U.S. patent application Ser. No. 16/804,667, titled MULTI-MODAL, MULTI-TECHNIQUE VEHICLE SIGNAL DETECTION, filed on Feb. 28, 2020, which is a Continuation Application of and claims priority to U.S. patent application Ser. No. 16/803,829, titled MULTI-MODAL, MULTI-TECHNIQUE VEHICLE SIGNAL DETECTION, filed on Feb. 27, 2020. Both of these U.S. patent applications are incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates generally to detecting vehicle signals in images from one or more cameras mounted on a vehicle.

BACKGROUND

Some vehicles can sense the environment in which they drive using a camera and sensors such as Light Detection and Ranging (LIDAR) and radar. The camera can capture images of objects in the environment. The LIDAR and radar sensors can detect depths of the objects in the environment.

Some vehicles are autonomous vehicles (AVs) that drive in their environment independently of human supervision. These AVs can include systems that detect signal lights of a vehicle within images from the camera. In these systems, an entirety of an image obtained from the camera is searched for the signal lights. This search is computationally intensive and expensive, which makes it slow and time-consuming for a vehicle traveling at road speeds.

BRIEF DESCRIPTION OF THE DRAWINGS

To provide a more complete understanding of the present disclosure and features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying figures, wherein like reference numerals represent like parts, in which:

FIG. 1 is a block diagram illustrating a multi-modal vehicle signal detector in accordance with various implementations of the present disclosure;

FIG. 2 is a block diagram illustrating a multi-technique vehicle signal detector in accordance with various implementations of the present disclosure;

FIG. 3 is a functional block diagram illustrating an autonomous vehicle having multi-modal and/or multi-technique vehicle signal detection, in accordance with various implementations of the present disclosure;

FIG. 4 is a functional block diagram illustrating an autonomous driving system (ADS) associated with an autonomous vehicle, in accordance with various implementations of the present disclosure;

FIG. 5 illustrates a workflow for multi-modal and/or multi-technique vehicle signal detection in accordance with various implementations of the present disclosure;

FIG. 6 illustrates a vehicle represented as a three-dimensional box and a mapping of the vehicle's lights;

FIG. 7 illustrates a situation in which a left side of a three-dimensional object and a right side of the three-dimensional object are captured in independent images;

FIG. 8 illustrates interpretation of detection results from a plurality of vehicle signal detectors in accordance with various implementations of the present disclosure;

FIG. 9 illustrates a method for multi-modal object projection in accordance with various implementations of the present disclosure;

FIG. 10 illustrates a method for multi-technique vehicle signal detection in accordance with various implementations of the present disclosure; and

FIG. 11 illustrates components of a computing system used in various implementations of the present disclosure.

DETAILED DESCRIPTION

Introduction

A depth sensor and/or camera mounted on a body of a vehicle can detect a position of an object in an environment surrounding the vehicle. An object tracker, performing object detection and tracking based on information from the depth sensor and/or camera, can represent an object as a polygon or box. An object tracker may also identify objects as vehicles. The object tracker can determine and provide information relating to a position of an object, an orientation of the object, and a size of the object (e.g., represented as a three-dimensional box). The information can be provided by the object tracker in the coordinate system of the object tracker. The detected position of the object can be used to identify a subsection of an image from the camera that includes the object. Thus, a search for a vehicle signal on the object can be limited to the subsection of the image. The subsection of the image is referred to herein as a region of interest. By limiting the search to a region of interest, the search for vehicle signals can be more efficient. Moreover, the search for vehicle signals can be more robust when performed with information from the object tracker, since there is a higher degree of certainty that the identified vehicle signals are indeed signals on a vehicle, and not on other types of objects such as street lights and traffic lights. A search can be made more efficient by focusing the search on the front and rear faces of the vehicle or on the outer portions of the sides of the vehicle. In some cases, the search can leverage information from the object tracker and information about where the vehicle signals are likely to be located to optimize the search for vehicle signals in the image from the camera.

In some embodiments of the present disclosure, a system maintains information that relates pixels of images captured by a camera to the coordinate system of the object tracker. This information can be represented as a geometric matrix transformation that can transform information in the coordinate system of the object tracker to the coordinate system of the camera. Specifically, the geometric matrix transformation can transform information in the coordinate system of the object tracker to pixels of an image captured by the camera. The system receives, from the object tracker, three-dimensional coordinates of a vehicle in an environment surrounding the camera. The object tracker of the system can represent the vehicle as a three-dimensional box, based on information such as the three-dimensional coordinates of the object. Through a geometric matrix transformation, the system can project the three-dimensional coordinates of the box onto the two-dimensional image captured by the camera. In some cases, the system can project a face of the three-dimensional box (e.g., a vehicle side) onto the two-dimensional image captured by the camera. The system can then determine a region of interest (ROI) for the vehicle in an image captured by the camera, based on the projection, and optionally other information from the object tracker. The system can then search for lights of the vehicle within the ROI.
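
Purely for illustration, the following Python sketch shows one way such a projection and ROI determination could be carried out; the 3x4 projection matrix, the set of box corners, and the image dimensions are assumed inputs, not elements recited by this disclosure.

import numpy as np

def project_box_to_roi(corners_3d, P, image_width, image_height):
    """Project the eight corners of a tracked 3-D box into pixel space
    and return an axis-aligned region of interest (ROI)."""
    # Homogeneous coordinates for the corners: shape (8, 4)
    homogeneous = np.hstack([corners_3d, np.ones((corners_3d.shape[0], 1))])
    # Apply the 3x4 geometric matrix transformation (tracker frame -> pixels)
    projected = (P @ homogeneous.T).T              # shape (8, 3)
    pixels = projected[:, :2] / projected[:, 2:3]  # perspective divide
    # ROI is the bounding box of the projected corners, clipped to the image
    x_min, y_min = np.clip(pixels.min(axis=0), 0, [image_width, image_height])
    x_max, y_max = np.clip(pixels.max(axis=0), 0, [image_width, image_height])
    return int(x_min), int(y_min), int(x_max), int(y_max)

The returned rectangle would then bound the search for vehicle lights to the portion of the image where the tracked vehicle is expected to appear.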

In implementations in which the box has unequal faces (as distinguished from the equal faces of a cube), an object tracker of the system can maintain a sense of the shorter length of the front and rear faces of the vehicle and the longer length of the sides of the vehicle. Thus, the system, leveraging information from the object tracker, can perform a more efficient and more robust search for lights. Having a box representation for vehicles and identifying regions of interest to search for lights means that other sources of lights which are not vehicle signal lights (e.g., street lights, traffic lights, reflections of vehicle signal lights, etc.) can be excluded. The system can thereby effectively reduce false positives triggered by these other sources of lights.

In addition, issues can arise if only one image is searched for a vehicle signal. For example, if a portion of the object is temporarily occluded by another object, then an image from the camera will not include a light within the occluded portion of the object. Similarly, if a portion of the object is not within the field-of-view of the camera, then an image from the camera will not include any light outside the field-of-view of the camera. Other issues can arise due to the exposure setting of a given camera, even in the absence of any occlusion. A vehicle signal may be visible from one camera having a fixed exposure, but the same vehicle signal may not be detectable from another camera having automatic exposure, or vice versa.

In some embodiments of the present disclosure, a system includes multiple cameras that capture two-dimensional images of an object from different positions and detect one or more lights of the object within the images. Accordingly, the system can detect a light of the object in an image from one camera, even if the light is occluded, or not present, in an image from another camera.

Further, in one situation, a camera might capture an image of just the left light of an object, and another camera might capture an image of just the right light of the object. In this situation, fusing information of the two images can provide additional information that cannot be derived from just one of the images alone. For example, the system can semantically interpret an image of just the left light or an image of just the right light as indicating the possibility that a turn signal is ON or a hazard light is ON. By fusing information of the two images, the system can semantically determine that both lights are ON and, therefore, the hazard lights are ON and the turn signals are OFF. Thus, the system can more accurately control an autonomous vehicle based on the clearer semantic meaning.

In some embodiments of the present disclosure, a system includes multiple cameras, with different exposure settings, which capture two-dimensional images of an object. Based on the images from the multiple cameras, the system can detect one or more lights of the object within the images. Accordingly, the system can detect a light of the object in an image from one camera, even if the light is not detectable in an image from another camera.

In some implementations, different techniques for detecting vehicle lights can be applied to one or more images, possibly from the same camera or from different cameras. Interpretation of vehicle signal detection results (e.g., semantic meanings) from the different techniques can improve robustness and be more accurate than a system involving a single technique.

Overview of Selected Features

FIG. 1 is a block diagram illustrating a multi-modal vehicle signal detector in accordance with an implementation. The system of FIG. 1 includes one or more image sensor(s) 110, one or more depth sensor(s) 120, a tracker 130, optional one or more other sensor(s) 135, the multi-modal vehicle signal detector 140, an object projector 150, and a vehicle signal detector 160. The multi-modal vehicle signal detector 140 can be included in a vehicle, such as the AV discussed with regard to FIG. 3.

The one or more image sensor(s) 110 can include one or more camera(s) mounted on the vehicle. The one or more image sensor(s) 110 capture two-dimensional images of an environment through which the vehicle drives. The one or more image sensor(s) 110 output one or more of the two-dimensional images to the multi-modal vehicle signal detector 140.

The one or more depth sensor(s) 120 are or include one or more LIDAR sensor(s) and/or one or more radar sensors mounted on the vehicle. The one or more depth sensor(s) 120 determine distances between the one or more depth sensor(s) 120 and objects in the environment through which the vehicle drives. The one or more depth sensor(s) 120 output to the tracker 130 depth maps indicating the depths from the one or more depth sensor(s) 120 to locations of the objects in the environment. Thus, the tracker 130 can produce three-dimensional coordinates of the objects in the environment.

The optional one or more other sensor(s) 135 can include a compass, a speedometer, a global positioning system (GPS) sensor, an accelerometer, and/or any other sensor. The optional other sensor(s) 135 output information to the tracker 130. Namely, the compass can output which direction the vehicle is facing (e.g., a heading). The speedometer can output a speed of the vehicle. Thus, the compass and speedometer can collectively output a velocity. The GPS sensor can output a current location of the vehicle. The accelerometer can output an acceleration of the vehicle. The optional one or more other sensor(s) 135 can also include sensors used in vehicle-to-vehicle communication.

The tracker 130 receives the depth maps from the one or more depth sensor(s) 120. The tracker 130 can also receive information (e.g., a heading, a speed, a velocity, a location, or an acceleration of the vehicle) from the optional one or more other sensor(s) 135. In some cases, the tracker 130 can receive images captured by the one or more image sensor(s) 110. The tracker 130 can output position information, orientation information, and size information associated with one or more objects in the environment. The tracker 130 can also output a classification of an object, such as an identification that the object is a vehicle, or a specific type of vehicle. The tracker 130 can assign a unique identifier (e.g., a number) to each of the objects in the environment. The tracker 130 can track the distance of each object over time as the vehicle travels through the environment, and can take into account the velocity or acceleration of the vehicle if desired. The tracker 130 can determine the position, the velocity, and the acceleration of the objects in the environment. The tracker 130 can form polygons of objects in the environment. The tracker 130 can output to the multi-modal vehicle signal detector 140 the tracked objects, as represented by, e.g., their unique identifier, position, polygon, velocity, and/or acceleration. In some implementations, the tracker 130 detects objects based on signal processing techniques. In some implementations, the tracker 130 can detect objects based on artificial intelligence, e.g., by providing depth information and/or images to convolutional neural networks ("ConvNets"), e.g., implemented with frameworks such as Google TensorFlow.
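
A minimal sketch of the kind of record the tracker 130 could emit per tracked object follows; the field names and types are illustrative assumptions rather than a defined interface.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TrackedObject:
    # Unique identifier assigned by the tracker
    track_id: int
    # Classification, e.g., "vehicle", "truck", "unknown"
    label: str
    # Center position in the tracker's coordinate system (meters)
    position: Tuple[float, float, float]
    # Heading in radians and box size (length, width, height) in meters
    heading: float
    size: Tuple[float, float, float]
    # Kinematics estimated over time
    velocity: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    acceleration: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    # Footprint polygon vertices in the tracker frame
    polygon: List[Tuple[float, float]] = field(default_factory=list)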

In an example in which the object is a vehicle (e.g., a car or truck), the tracker 130 can represent a front face of the box based on the windshield, hood, grille, and/or front bumper of the vehicle. The tracker 130 can represent a rear face of the box based on the rear windshield, trunk/tailgate, spoiler (where applicable), and/or rear bumper of the vehicle. The tracker 130 can represent a side face of the box based on the windows, doors, side panels, and/or tires of the vehicle. The tracker 130 can indicate to the object projector 150 information relating the three-dimensional box to the specific faces of a vehicle. In some implementations, the tracker 130 positions the front face of the box based on the three-dimensional coordinates of the grille and the rear face of the box based on the three-dimensional coordinates of the tailgate. The grille and tailgate more closely approximate the location of the vehicle's lights than, e.g., the windshield and the rear windshield, which may be horizontally offset or otherwise slope toward the position of the vehicle's lights. Similarly, in some implementations, the tracker 130 positions the side faces of the box based on the three-dimensional coordinates of the side panels of the vehicle.

The object projector 150 of the multi-modal vehicle signal detector 140 receives the one or more image(s) from the one or more image sensor(s) 110 and the tracked objects from the tracker 130. In some implementations, the object projector 150 also receives timestamps at which the two-dimensional images were captured by the one or more image sensor(s) 110. The information from the tracker 130 provided to the object projector 150 can include information associated with one or more objects (e.g., a vehicle) in the environment, each object represented as a three-dimensional box. The object projector 150 can apply a transform (e.g., a geometric matrix transform) to project the three-dimensional box onto a two-dimensional image.

Accordingly, the object projector 150 can determine a two-dimensional view of the three-dimensional box within a two-dimensional image captured by the one or more image sensor(s) 110. In some cases, the object projector 150 can determine a two-dimensional view of a side of the three-dimensional box within the two-dimensional image captured by the one or more image sensor(s) 110. In addition, because the object projector 150 projects multiple objects, the object projector 150 can also determine, through occlusion reasoning, an occlusion of part or an entirety of a first one of the objects, due to a second one of the objects.

In some implementations, the object projector 150 can correct the two-dimensional view of the three-dimensional box based on the field-of-view of the one or more image sensor(s) 110. This correction can be performed in view of imperfections, such as aberration, in a lens of the one or more image sensor(s) 110.

The object projector 150, thus, based on the projection, determines an ROI in an image from the one or more image sensor(s) 110. The object projector 150 can determine the ROI based on the two-dimensional view of the three-dimensional box. Thus, the ROI represents where the object projector 150 expects the object to be in the image. The object projector 150 outputs the ROI. In some implementations, the object projector 150 can also output a timestamp of the image including the ROI.

The vehicle signal detector 160 then efficiently searches for lights of the object within the ROI.

FIG. 2 is a block diagram illustrating a multi-technique vehicle signal detector 240 in accordance with an implementation. The system of FIG. 2 includes one or more image sensor(s) 210, one or more depth sensor(s) 220, a tracker 230, optional one or more other sensor(s) 235, the multi-technique vehicle signal detector 240, one or more vehicle signal detector(s), and an interpretation module 275. The one or more vehicle signal detector(s) can include a first vehicle signal detector 245, an optional second vehicle signal detector 255, through an optional Nth vehicle signal detector 265. Preferably, the multi-technique vehicle signal detector 240 includes two or more vehicle signal detectors.

The one or more image sensor(s) 210, one or more depth sensor(s) 220, tracker 230, and optional one or more other sensor(s) 235 are structurally and functionally similar or identical to the one or more image sensor(s) 110, one or more depth sensor(s) 120, tracker 130, and optional one or more other sensor(s) 135, respectively. Thus, no further description of the one or more image sensor(s) 210, one or more depth sensor(s) 220, tracker 230, and optional one or more other sensor(s) 235 is provided for the purposes of this overview.

The one or more vehicle signal detector(s) receive two-dimensional images from the image sensor(s) 210, the tracked objects from the tracker 230, and, optionally, information from the optional one or more other sensor(s) 235. In some implementations, the one or more vehicle signal detector(s) also receive timestamps at which the two-dimensional images were captured by the one or more image sensor(s) 210.

The one or more vehicle signal detector(s) can be one signal detector, such as the first vehicle signal detector 245, that detects signals in images from all of the one or more image sensor(s) 210. The one or more vehicle signal detector(s) alternatively can include N vehicle signal detectors, with each of the N vehicle signal detectors detecting signals in images from a corresponding one of N image sensors.

In the case where there are a plurality of vehicle signal detectors, the vehicle signal detectors can receive the same inputs but may implement different techniques involving geometric and/or semantic determinations. Alternatively, the vehicle signal detectors may individually receive different inputs but implement the same or similar techniques involving geometric and/or semantic determinations.

The one or more vehicle signal detector(s) search for lights within the received two-dimensional image(s). The search can be based on, for example, a minimum bounding box of a light in an image, a color of the light in the image, and an average intensity of pixels of the light in the image.
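
As a rough, purely illustrative sketch of such a search, the fragment below thresholds an ROI for bright pixels and reports a minimum bounding box, an average intensity, and a coarse color guess; a practical detector would more likely rely on trained models than on the fixed thresholds assumed here.

import numpy as np

def find_light_candidate(roi_rgb, min_intensity=200):
    """Very small illustrative search for one bright light in an ROI.

    roi_rgb: H x W x 3 uint8 crop of the camera image.
    Returns (bounding_box, mean_intensity, color_guess) or None.
    """
    intensity = roi_rgb.mean(axis=2)                 # per-pixel brightness
    bright = intensity > min_intensity               # candidate light pixels
    if not bright.any():
        return None
    ys, xs = np.nonzero(bright)
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # minimum bounding box
    mean_intensity = float(intensity[bright].mean())
    r, g, b = roi_rgb[bright].mean(axis=0)
    color_guess = "red" if r > 1.5 * g else ("amber" if r > b else "white")
    return bbox, mean_intensity, color_guess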

After the one or more vehicle signal detector(s) detect the light, the one or more vehicle signal detector(s) make a geometric or semantic determination based on the light. In an example of a geometric determination, the one or more vehicle signal detector(s) determine a position of the light on the three-dimensional box, a color of the light, a shape of the light, and a size of the light. In an example of a semantic determination, the one or more vehicle signal detector(s) determine a meaning of the light, such as "headlight ON."

The one or more vehicle signal detector(s) output the determined geometric (e.g., position, color, shape, size) or semantic (e.g., headlight ON) information to the interpretation module 275. In some implementations, the one or more vehicle signal detector(s) also output the object identifiers and/or a three-dimensional box or cube representing the object and including representations of the lights.

The interpretation module 275 receives and fuses the determined information from the one or more vehicle signal detector(s) and performs a (final) semantic determination on the signals. The interpretation module 275 outputs this semantic determination as a (final) detection result. A vehicle including the multi-technique vehicle signal detector can be controlled based on this semantic meaning.

According to some embodiments, the interpretation module 275 can fuse geometric or semantic information from a plurality of vehicle signal detectors operating on different images from a plurality of cameras. In some cases, the interpretation module 275 can perform such fusion based on the object identifiers and/or the received three-dimensional boxes or cubes. Thus, even if a light is partially occluded or not present in an image from one camera, the system can perform reasoning to deduce a semantic detection result.

According to some embodiments, the interpretation module 275 can output a (final) semantic detection result based on outputs generated by a plurality of vehicle signal detectors applying different techniques on the same image. Thus, even if a vehicle signal light is not detectable by one technique, another vehicle signal detector may still detect it. In some cases, multiple results from the different techniques are fused, e.g., in a voting scheme, to increase the robustness of the multi-technique vehicle signal detector 240.
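
A simple majority-vote fusion is sketched below for illustration; the label strings and the voting rule are assumptions and are not the interpretation module's actual logic.

from collections import Counter

def fuse_by_vote(detector_outputs):
    """Fuse semantic labels from several vehicle signal detectors.

    detector_outputs: list of labels such as ["brakes ON", "brakes ON", "unknown"].
    Returns the majority label, or "unknown" when no label wins a majority.
    """
    votes = Counter(label for label in detector_outputs if label != "unknown")
    if not votes:
        return "unknown"
    label, count = votes.most_common(1)[0]
    return label if count > len(detector_outputs) / 2 else "unknown"

# Example: three techniques applied to the same image
print(fuse_by_vote(["brakes ON", "brakes ON", "left blinker ON"]))  # -> "brakes ON"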

In some implementations, the multi-modal vehicle signal detector 140 of FIG. 1 can be combined with the multi-technique vehicle signal detector 240 of FIG. 2. For instance, the multi-modal vehicle signal detector 140 of FIG. 1 can be used as a vehicle signal detector within the multi-technique vehicle signal detector 240 of FIG. 2.

Autonomous Vehicle

FIG. 3 is a functional block diagram illustrating an AV having multi-modal and/or multi-technique vehicle signal detection, in accordance with various implementations.

As depicted in FIG. 3, the autonomous vehicle 10 generally includes a chassis 12, a body 14, front wheels 16, and rear wheels 18. The body 14 is arranged on the chassis 12 and encloses components of the autonomous vehicle 10. The body 14 and the chassis 12 can jointly form a frame. The front wheels 16 and rear wheels 18 are rotationally coupled to the chassis 12 near respective corners of the body 14.

In various implementations, system 300, and/or components thereof, are incorporated into the autonomous vehicle 10. The autonomous vehicle 10 is, for example, a vehicle controlled to carry passengers from one location to another, independent of human intervention. The autonomous vehicle 10 is depicted in FIG. 3 as a passenger car; any other vehicle, including a motorcycle, truck, sport utility vehicle (SUV), recreational vehicle (RV), marine vessel, aircraft, and the like, can also be used.

In an example, the autonomous vehicle 10 implements a level four or level five automation system under the Society of Automotive Engineers (SAE) "J3016" standard taxonomy of automated driving levels. Using this terminology, a level four system indicates "high automation," referring to a driving mode in which the automated driving system performs all aspects of the dynamic driving task, even if a human driver does not respond appropriately to a request to intervene. A level five system indicates "full automation," referring to a driving mode in which the automated driving system performs all aspects of the dynamic driving task under all roadway and environmental conditions that can be managed by a human driver. Implementations are not limited to any taxonomy or rubric of automation categories. Further, systems in accordance with the present disclosure can be used in conjunction with any autonomous or other vehicle that utilizes a navigation system and/or other systems to provide route guidance and/or route implementation.

As shown, the autonomous vehicle 10 generally includes a propulsion system 20, a transmission system 22, a steering system 24, a brake system 26, a sensor system 28, an actuator system 30, a data storage device 32, at least one controller 34, and a communication system 36. The propulsion system 20 can, in various implementations, include an internal combustion engine, an electric machine such as a traction motor, and/or a fuel cell propulsion system. The transmission system 22 transmits power from the propulsion system 20 to the front wheels 16 and the rear wheels 18 according to selectable speed ratios. According to various implementations, the transmission system 22 can include a step-ratio automatic transmission, a continuously variable transmission, or other appropriate transmission.

The brake system 26 provides braking torque to the front wheels 16 and/or the rear wheels 18. The brake system 26 can, in various embodiments, include friction brakes, brake-by-wire, a regenerative braking system such as an electric machine, and/or other appropriate braking systems.

The steering system 24 influences a position of the front wheels 16 and/or the rear wheels 18. While depicted as including a steering wheel 25, in some implementations of the present disclosure, the steering system 24 might not include a steering wheel.

The sensor system 28 includes one or more sensing devices 40a-40n that sense observable conditions of the exterior environment and/or the interior environment of the autonomous vehicle 10. The sensing devices 40a-40n can include, but are not limited to, radars, LIDARs, a GPS sensor, optical cameras, thermal cameras, ultrasonic sensors, a speedometer, a compass, an accelerometer, and/or other sensors. The actuator system 30 includes one or more actuator devices 42a-42n that control one or more vehicle features such as, but not limited to, the propulsion system 20, the transmission system 22, the steering system 24, and the brake system 26. In various embodiments, the autonomous vehicle 10 can also include interior and/or exterior vehicle features not illustrated in FIG. 3, such as various doors, a trunk, and cabin features such as air conditioning, music players, lighting, touch-screen display components (such as those used in connection with navigation systems), and the like.

The data storage device 32 stores data to control the AV 10. In various embodiments, the data storage device 32 stores defined maps of the navigable environment. In various embodiments, the defined maps are predefined by and obtained from a remote system. For example, the defined maps can be assembled by the remote system and communicated to the autonomous vehicle 10 (wirelessly and/or in a wired manner) and stored in the data storage device 32. Route information can also be stored within the data storage device 32—i.e., a set of road segments (associated geographically with one or more of the defined maps) that together define a route the vehicle can take to travel from a start location (e.g., the vehicle's current location) to a target location. Also, in various implementations, the data storage device 32 stores vehicle signal detection software 38 that, when executed by one or more processors, implements methods described herein. As will be appreciated, the data storage device 32 can be part of the controller 34, separate from the controller 34, or part of the controller 34 and part of a separate system.

The controller 34 includes a processor 44 and a computer-readable storage device or media 46. The processor 44 can be any custom-made or commercially available processor, a central processing unit (CPU), a graphics processing unit (GPU), an auxiliary processor among several processors associated with the controller 34, a semiconductor-based microprocessor (in the form of a microchip or chip set), any combination thereof, or generally any device for executing instructions. The processor 44 can include one or more processors 44 that cooperate to execute instructions. The computer-readable storage device or media 46 can include volatile and nonvolatile storage in read-only memory (ROM), random-access memory (RAM), and keep-alive memory (KAM), for example. KAM is a persistent or nonvolatile memory that can store various operating variables while the processor 44 is powered down. The computer-readable storage device or media 46 can be implemented using any of a number of known memory devices such as PROMs (programmable read-only memory), EPROMs (erasable PROM), EEPROMs (electrically erasable PROM), flash memory, or any other electric, magnetic, optical, resistive, or combination memory devices for storing data, some of which represent executable instructions, used by the controller 34 in controlling the AV 10.

The instructions can encode/include one or more separate programs that include an ordered listing for implementing logical functions. The instructions, when executed by the processor 44, receive and process signals from the sensor system 28, perform logic, calculations, methods, and/or algorithms for controlling the components of the AV 10, and generate control signals transmitted to the actuator system 30 to control the components of the AV 10 based on the logic, calculations, methods, and/or algorithms. Although FIG. 3 shows a single controller 34, implementations of the AV 10 can include any number of controllers 34 that communicate over any suitable communication medium or a combination of communication media and that cooperate to process the sensor signals, perform logic, calculations, methods, and/or algorithms, and generate control signals to control features of the AV 10. In one implementation, as discussed in detail below, the controller 34 can perform multi-modal and/or multi-technique vehicle signal detection.

The communication system 36 wirelessly communicates information to and from other entities 48, such as, but not limited to, other vehicles ("V2V" communication), infrastructure ("V2I" communication), remote transportation systems, and/or user devices. In an example, the communication system 36 is a wireless communication system that communicates via a wireless local area network (WLAN) using IEEE (Institute of Electrical and Electronics Engineers) 802.11 standards or by using cellular data communication. Additional or alternate communication methods, such as a dedicated short-range communications (DSRC) channel, are also within the scope of the present disclosure. DSRC channels refer to one-way or two-way short-range to medium-range wireless communication channels for automotive use and a corresponding set of protocols and standards.

Vehicle signal detection software 38 includes instructions for performing multi-modal and/or multi-technique vehicle signal detection, as previously illustrated by FIGS. 1 and 2.

The controller 34 also includes an artificial intelligence module 50 to perform feature detection/classification, obstruction mitigation, route traversal, mapping, sensor integration, ground-truth determination, and the like. The artificial intelligence module 50 can implement techniques such as decision trees, probabilistic classifiers, naive Bayes classifiers, support vector machines, deep learning, a neural network, a convolutional neural network, a recurrent neural network, random forests, genetic algorithms, and/or reinforcement learning.

Autonomous Driving System

In accordance with various implementations, the controller 34 implements the level four or level five automation system discussed with regard to FIG. 3 as an autonomous driving system (ADS) 70 as shown in FIG. 4.

As shown in FIG. 4, software and/or hardware components of the controller 34 (e.g., processor 44 and computer-readable storage device or media 46) provide the ADS 70.

The autonomous driving system 70 can include a vehicle signal detection system 74, a positioning system 76, a guidance system 78, and a vehicle control system 80. In various implementations, the system can be organized into any number of systems (e.g., combined or further partitioned), as the disclosure is not limited to the present examples.

As shown in FIG. 4, the sensor system 28 can include a camera 424, a depth sensor 426, and any other sensor, such as a speedometer and a GPS sensor.

In this disclosure and its accompanying claims, the term "camera" should be considered to cover many alternatives, such as a single camera or multiple cameras. A single digital camera can have one image sensor or multiple image sensors. A single camera can have no lenses, a single lens, or multiple lenses. In some implementations, a single camera with multiple lenses has a 360° view (or a nearly 360° view) at a same (or similar) height with no "blind spots" around the AV 10.

Multiple cameras can have multiple lenses and/or multiple image sensors. In some cases, the multiple cameras can share a single lens and/or a single image sensor. Multiple image sensors may have different exposure settings. Multiple cameras may have different fields of view. Multiple image sensors may have different resolutions. Multiple image sensors may detect different wavelengths of light. Of course, the multiple cameras can also have no lenses. In some implementations, the multiple cameras have a 360° view (or a nearly 360° view) at a same (or similar) height with no "blind spots" around the AV 10.

Similarly, the term "depth sensor" should be considered to cover many alternatives, such as one depth sensor, multiple cooperating depth sensors, or multiple independent depth sensors.

Light enters through the camera 424, and an image sensor of the camera 424 converts the light into information of a digital, two-dimensional image. The camera 424 repeatedly captures two-dimensional images of its surrounding environment. In many implementations, the camera 424 also records timestamps at which the two-dimensional images are captured by the image sensor.

The depth sensor 426 can be a radar sensor, a LIDAR sensor, a time-of-flight sensor, or another sensor for determining distances to objects in the environment around the AV 10. In some cases, the depth sensor 426 can be implemented with stereo cameras.

In an example in which depth sensor 426 is implemented with a LIDAR sensor, the LIDAR sensor fires rapid pulses of laser light at surfaces of objects (e.g., vehicles, traffic lights) in the environment surrounding the AV 10. The LIDAR sensor measures the amount of time for the laser to travel from the LIDAR sensor to the object, bounce off the surface of the object, and return to the LIDAR sensor. Because light moves at a constant speed, the LIDAR sensor can calculate the distance between the surface of the object and the LIDAR sensor. The LIDAR sensor can provide 360° coverage of laser light around the AV 10.
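
The range computation described above reduces to halving the product of the round-trip time and the speed of light, as the short illustrative sketch below restates.

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def lidar_range(round_trip_seconds):
    """Distance to the reflecting surface: half the round-trip path length."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

print(lidar_range(2.0e-7))  # a 200 ns round trip corresponds to roughly 30 m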

The system uses the depth sensor 426 to obtain fine details of the geometry and position of the object. The vehicle signal detection system 74, specifically the tracker 427 as discussed below, can determine where the back, sides, and top of the object are, based on these details.

The vehicle signal detection system 74 can include, cooperate with, or interface with a tracker 427, which can be implemented in a same or similar fashion as the tracker 130 of FIG. 1 and the tracker 230 of FIG. 2. The tracker 427 tracks an object by assigning it an identification (e.g., a unique number). In one implementation, the tracker 427 begins tracking an object when it appears in consecutive frames from the camera 424. The tracked object can be any real-world object. For the purposes of the present disclosure, the object is assumed to be an object (such as a traffic light or another vehicle) that has lights with standard meanings. The tracker 427 may classify the tracked object as a vehicle, or not a vehicle. The tracker 427 may classify the tracked object as a specific type of vehicle.

The tracker 427 determines a location of the tracked object relative to the autonomous vehicle 10, using the depth sensor 426 and the camera 424. In addition, the tracker 427 can determine changes in position, velocity, and acceleration, using the depth sensor 426 and the camera 424. Thus, the vehicle signal detection system 74 can determine the presence, location, classification, and/or path of objects and features of the environment of the autonomous vehicle 10, based on the changes in position, velocity, and acceleration determined by the tracker 427. The tracker 427 can form a polygon or box of a tracked object.

The vehicle signal detection system 74 detects light signals of objects within images from the sensor system 28, using mechanisms illustrated in FIGS. 1 and 2. The vehicle signal detection system 74 can also analyze characteristics of the detected light signals, such as whether the light signals are brake lights, turn signals, reverse lights, emergency lights, or school bus lights. That is, the vehicle signal detection system 74 can determine a signal state of another vehicle, such as whether the other vehicle is braking, has its hazard lights on, has its turn signal on, and so on.

The positioning system 76 processes sensor data from the sensor system 28 along with other data to determine a position (e.g., a local position relative to a map, an exact position relative to a lane of a road, a vehicle heading, etc.) and a velocity of the autonomous vehicle 10 (of FIG. 3) relative to the environment.

The guidance system 78 processes sensor data from the sensor system 28 along with other data to determine a path for the autonomous vehicle 10 (of FIG. 3) to follow to a destination.

The vehicle control system 80 generates control signals for controlling the autonomous vehicle 10 (of FIG. 3), e.g., according to the path determined by the guidance system 78.

Thus, the vehicle signal detection system 74 can provide a more efficient and less computationally intensive approach to finding vehicle signals. In some cases, the vehicle signal detection system 74 can provide a more robust approach to finding vehicle signals. Further, the signal state determination by the vehicle signal detection system 74 can improve tracking of other vehicles that are in proximity to the AV 10 and, thus, control of the AV 10 by the vehicle control system 80.

Exemplary implementations of the vehicle signal detection system 74 are illustrated in FIGS. 1, 2, and 5-10.

Vehicle Signal Detection System Architecture

FIG. 5 illustrates a workflow for multi-modal and/or multi-technique vehicle signal detection in accordance with one implementation of the present disclosure. The workflow includes transforms 510, tracker 520, sensors 530, an object projector 540, a light detector 550 and/or a semantic classifier 560, and a vehicle signal (VS) fusion module 570.

Object projection leverages information relating the coordinate system of the tracker 520 to the coordinate system of the camera, specifically, to pixels of images captured by the camera. Generally, the physical or geometric relationship between the coordinate system of the tracker 520 and the coordinate system of the camera is fixed (since the field-of-view of the camera is known and fixed within the coordinate system of the tracker 520). Accordingly, transforms 510 can be predetermined, and can encode the relationship using one or more geometric matrix transformations such that coordinates from the tracker 520 can be transformed into a pixel location in a two-dimensional image captured by the camera.
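
One conventional way to encode such a fixed relationship is to compose a camera intrinsic matrix with a tracker-to-camera extrinsic transform; the sketch below assumes a standard pinhole model and is only an illustration of how the transforms 510 could be precomputed.

import numpy as np

def build_tracker_to_pixel_transform(K, R, t):
    """Compose a 3x4 projection matrix mapping tracker coordinates to pixels.

    K: 3x3 camera intrinsic matrix.
    R: 3x3 rotation from the tracker frame to the camera frame.
    t: 3-vector translation from the tracker frame to the camera frame.
    """
    extrinsic = np.hstack([R, np.asarray(t).reshape(3, 1)])  # 3x4 [R | t]
    return K @ extrinsic                                      # 3x4 projection

def tracker_point_to_pixel(P, point_3d):
    """Transform a single tracker-frame point into a pixel location."""
    x, y, w = P @ np.append(point_3d, 1.0)
    return x / w, y / w

Because the camera is rigidly mounted on the vehicle, such a matrix can be computed once at calibration time and reused for every image.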

The tracker 520 is the same as or similar to the previously described tracker 427 (as seen in FIG. 4).

The sensors 530 generally correspond to the sensor system 28 and include the camera 424 and the depth sensor 426 (as seen in FIG. 4).

The object projector 540 receives the transforms 510, three-dimensional coordinates of the object from the tracker 520, and images (and, optionally, timestamps) from the sensors 530. The object projector 540 projects a three-dimensional box representing the object, or a side representing the object, onto a two-dimensional image captured by the camera. The projection can generate a two-dimensional view of the three-dimensional box or the side within the image captured by the camera. Based on the two-dimensional view, a region of interest in the images from the camera can be obtained. In addition, the object projector 540 can in some cases provide occlusion reasoning in the images, relative to occlusions captured by the camera or sensed by the tracker 520. Further, the object projector 540 can crop the images captured by the camera down to the ROIs including the projections of the three-dimensional object onto the images. In some cases, the object projector 540 can apply a mask to the images to remove pixels not associated with a vehicle, or remove pixels not associated with likely regions where vehicle signal lights would be present.

Although the object projector 540 might not be present in some implementations of the disclosure, the object projector 540 can make the search for light signals within the images more efficient by reducing the number of pixels to be searched to the cropped ROIs.
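
A minimal sketch of the cropping and masking step follows, under the assumption that the ROI arrives as a pixel rectangle and the mask as a binary array aligned with the image.

import numpy as np

def crop_and_mask(image, roi, mask=None):
    """Crop an image to the ROI and optionally zero out non-vehicle pixels.

    image: H x W x 3 array; roi: (x_min, y_min, x_max, y_max) in pixels;
    mask: optional H x W boolean array, True where the vehicle is expected.
    """
    x_min, y_min, x_max, y_max = roi
    crop = np.asarray(image)[y_min:y_max, x_min:x_max].copy()
    if mask is not None:
        crop_mask = np.asarray(mask)[y_min:y_max, x_min:x_max]
        crop[~crop_mask] = 0  # suppress pixels outside the projected vehicle
    return crop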

As detailed later, the light detector 550 and the semantic classifier 560 receive images from the camera of sensors 530 and the projected regions of interest and occlusions from the object projector 540.

The light detector 550 can detect coordinates of a light within the images provided by the camera using geometry. The light detector 550 can crop a portion of the image, based on the ROI. One or more light detectors 550 can be implemented.

The semantic classifier 560 can determine a semantic meaning of the lights detected in the images received from the camera. One or more semantic classifiers 560 can be implemented.

A VS fusion module 570 receives information regarding the detected lights from the light detector 550 and/or the semantic classifier 560.

Further, the VS fusion module 570 receives locations of the lights from the light detector 550 and/or semantic meanings of the lights from the semantic classifier 560.

The VS fusion module 570 includes an interpretation module that interprets the locations from the light detector 550 and/or the semantic meanings from the semantic classifier 560 to produce semantically identified vehicle signals 580 within combined images.

Object Projector

In more detail, the object projector 540 receives the transforms 510 and the coordinates of the three-dimensional tracked object from the tracker 520. The object projector 540 projects a three-dimensional box representing the tracked object into a two-dimensional image captured by a camera, based on the transforms 510 and the three-dimensional coordinates of the tracked object from the tracker 520. In some cases, the object projector 540 projects one or more vehicle sides of the three-dimensional polygon, individually.

In one implementation, there is one object projector 540 per camera. A more efficient implementation employs a plurality of cameras (possibly including all cameras) sharing one object projector 540.

The object projector 540 can determine the projection of the object with coarse and/or pixel-wise segmentation. This segmentation can delimit which part of the three-dimensional object is visible to the camera of sensors 530.

Because the tracker 520 tracks information for multiple objects, the object projector 540 can perform occlusion reasoning for the tracked objects. For example, if the object projector 540 determines that a first three-dimensional object is beyond a nearer, second three-dimensional object, the object projector 540 can determine that, e.g., half of the left side of the first three-dimensional object is not visible to the camera.
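
A rough sketch of this style of occlusion reasoning follows; it assumes each projected object is summarized by an axis-aligned box and a range to the sensor, which simplifies the pixel-wise segmentation described above.

def overlap_fraction(box_a, box_b):
    """Fraction of box_a's area covered by box_b; boxes are (x0, y0, x1, y1)."""
    x0 = max(box_a[0], box_b[0]); y0 = max(box_a[1], box_b[1])
    x1 = min(box_a[2], box_b[2]); y1 = min(box_a[3], box_b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    return inter / area_a if area_a else 0.0

def is_partially_occluded(target, others, threshold=0.3):
    """target/others: dicts holding a projected 'box' and a 'range' in meters."""
    return any(
        other["range"] < target["range"]                     # other object is nearer
        and overlap_fraction(target["box"], other["box"]) > threshold
        for other in others
    )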

Thus, the object projector 540 produces a two-dimensional view in the image from the camera of sensors 530, where the object is expected to appear in the image. The two-dimensional view or a derivation of the two-dimensional view can form the ROI in the image. The object projector 540 can also output the three-dimensional box representation of the tracked object to the light detector 550 and/or the semantic classifier 560.

Light Detector: Geometric Observations

As discussed above, the system can include the light detector 550 and/or the semantic classifier 560. The light detector 550 and semantic classifier 560 can receive images from the camera of sensors 530 and the projected regions of interest and occlusions from the object projector 540. The light detector 550 and/or the semantic classifier 560 can also receive from the object projector 540 the three-dimensional box representation of the tracked object. As discussed previously, the light detector 550 geometrically determines coordinates, size, shape, and/or perimeter of a light within images provided by the camera of sensors 530. The semantic classifier 560 determines a semantic meaning of the lights detected in the images received from the camera 424. In some embodiments, the functionalities of the light detector 550 and the semantic classifier 560 may be merged together.

In more detail, the light detector 550 is implemented as a single-frame light detector in some implementations. The light detector 550 receives the image from the camera of sensors 530, and, optionally, an ROI in the image from the object projector 540. The light detector 550 can crop the images down to the ROI and search within the ROI for one or more lights. This search can be performed by artificial intelligence, such as machine learning, deep learning, a neural network, a convolutional neural network (e.g., a ConvNet), a recurrent neural network, random forests, genetic algorithms, and reinforcement learning. The light detector 550 can be trained on images of positions of lights and appearances (e.g., color, size), such as by models of lights on a three-dimensional box. In some implementations, the artificial intelligence can be trained on region and class labels of those images.

The light detector 550 processes fully or partially visible objects within the images. For example, the light detector 550 determines that portions of an image are not part of a tracked object, based on the occlusions. For example, as discussed above, the object projector 540 can determine that half of the left side of a first three-dimensional object is occluded from the camera of sensors 530, due to a second three-dimensional object. In this situation, the light detector 550 can determine that any lights within the half of the left side of the first three-dimensional object do not belong to the first three-dimensional object.

Then, for an object, the light detector 550 computes geometric observations about the lights. These geometric observations can include where the lights are objectively detected in the two-dimensional image, light descriptors (e.g., type, size, color, shape, perimeter, embedded features), and locations where the lights are situated on the vehicle. The light detector 550 can project a light onto a three-dimensional representation of the object, such as the three-dimensional box received from the object projector 540 or a unit cube.

FIG. 6 illustrates a vehicle represented as a three-dimensional box or polygon 602 and a mapping of its lights. The vehicle is captured by a camera, and the image from the camera is represented by the image pixel space 608.

The light detector 550 has information about the surfaces of the three-dimensional box or polygon 602 as the sides of the vehicle, shown as vehicle sides 604. The three-dimensional box or polygon 602 and the sides of the vehicle can be provided by a tracker. Thus, although the object can have more than four sides (not including the top and bottom in this example), the light detector 550 visualizes the object as having only four vehicle sides 604 in this example. Further, the light detector 550 assumes exactly four corners for each side of the other vehicle.

In implementations with improved modeling, the number of corners per side can exceed four. Implementations also are possible in which the light detector 550 can receive information that models the object as having more than four sides (again, not including the top or bottom of the object). Further implementations are possible in which the object has six or more sides (including the top and bottom of the object, in the case of flashing lights on a roof of a police car, or underglow or ground effects on the bottom of a customized car).

Geometrically, the light detector 550 can detect the lights 606 as shown in the image pixel space 608. When the tracker provides information identifying specific vehicle sides 604 of a vehicle to the light detector 550, detected lights can be associated to specific vehicle sides 604 (and vice versa) in the relative space 610 of FIG. 6.

As shown in the lights 606 of FIG. 6, the light detector 550 can determine the location (e.g., pixel location), size, shape, average intensity (e.g., in lumens), and/or color (e.g., white, yellow, red) of the lights located in the image pixel space 608. The information about the coordinates of the three-dimensional box or polygon 602 and/or coordinates of the vehicle sides 604 can provide regions of interest for the light detector 550 to locate vehicle lights in the image pixel space 608 efficiently.

The light detector 550 can apply image processing or signal processing techniques to determine the location (e.g., pixel location), size, shape, average intensity (e.g., in lumens), and/or color (e.g., white, yellow, red) of the lights located in the image pixel space 608. In some cases, the light detector 550 can apply artificial intelligence to determine the semantic meaning of the lights. This artificial intelligence can be implemented with techniques such as machine learning, deep learning, a neural network, a convolutional neural network (e.g., a ConvNet), a recurrent neural network, random forests, genetic algorithms, and reinforcement learning.

If desired, the light detector 550 can map the lights to coordinates of a relative space 610 of a unit cube 612, as shown in FIG. 6. That is, the light detector 550 can regress light position information from the image pixel space 608 to the relative space 610. In the example illustrated in FIG. 6, the origin of the relative space 610 is the top-left corner of the cube 612. Thus, one example of an output of the light detector 550 could be "circular, 100 cm² amber light on object 12345 at [0.9, 0.8, 0]."
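
The mapping from the image pixel space 608 to the relative space 610 can be illustrated by normalizing a detected light's pixel location against the projected extent of the vehicle side; the corner layout and face-depth convention below are assumptions made only for the sketch.

def pixel_to_relative(light_xy, side_corners_xy, face_depth=0.0):
    """Map a detected light's pixel location onto unit-cube coordinates.

    light_xy: (u, v) pixel location of the light.
    side_corners_xy: (u_min, v_min, u_max, v_max) of the projected vehicle side.
    face_depth: 0.0 for the rear face, 1.0 for the front face (assumed convention).
    """
    u, v = light_xy
    u_min, v_min, u_max, v_max = side_corners_xy
    x_rel = (u - u_min) / float(u_max - u_min)   # 0 = left edge, 1 = right edge
    y_rel = (v - v_min) / float(v_max - v_min)   # 0 = top edge,  1 = bottom edge
    return (round(x_rel, 2), round(y_rel, 2), face_depth)

# Example: a light near the lower-right of the rear face maps to roughly [0.9, 0.8, 0]
print(pixel_to_relative((218, 156), (40, 60, 240, 180)))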

Further, the light detector 550 can skip light detection in an area determined to be occluded by another object.

Semantic Classifier: Semantic Observations

Turning to a semantic light configuration implementation, the semantic classifier 560 performs detection of a high-level light configuration on the tracked object.

The semantic classifier 560 can be trained on images of light configurations, such as "headlights on," "left blinker on," and "no occlusion." Thus, the semantic classifier 560 can output, in one example, a feature vector symbolizing, "This car has its brakes and left turn signals on, and the right side of the car is non-occluded." Other exemplary outputs of the semantic classifier 560 include "left blinker ON," "brakes ON," "left side occluded by other car," "left side occluded by image border," "lights on the right are ON while symmetrically placed lights on the left are OFF," and "no occlusion."

In an exemplary operation, the semantic classifier 560 receives the image from the camera of sensors 530 and a region of interest in the image from the object projector 540. The semantic classifier 560 also can crop patches of the image corresponding to, e.g., different vehicles in the image. In some implementations, the semantic classifier 560 can take occlusion into account when forming high-level light detections.

The semantic classifier 560 can apply image processing or signal processing techniques to determine the semantic meaning of the lights. In some cases, the semantic classifier 560 can apply artificial intelligence to determine the semantic meaning of the lights. This artificial intelligence can be implemented with techniques such as machine learning, deep learning, a neural network, a convolutional neural network (e.g., a ConvNet), a recurrent neural network, random forests, genetic algorithms, and reinforcement learning.

VS Fusion

Referring back to FIG. 5, the VS fusion module 570 compares and/or fuses multi-technique interpretations, such as vehicle signal detectors operating on images from multiple cameras, as well as multiple observations from the one or more light detectors 550 and/or one or more semantic classifiers 560. Some multi-technique approaches can include receiving information from multiple light detectors trained on different data sets and/or semantic classifiers trained on different data sets. Thus, the VS fusion module 570 can fuse multiple observations and semantic messages from the semantic classifier 560 and geometric observations from the light detector 550. The fusion of these interpretations can reduce false positives and add redundancy where a single component, such as a single camera, would otherwise be a single point of failure.

The VS fusion module 570 generally corresponds to FIG. 2. The VS fusion module 570 can be implemented as one fusion node or, for redundant systems, several fusion nodes.

In some embodiments, the VS fusion module 570 can receive coordinates of a light on a three-dimensional box from the light detector 550 or semantic observations from the semantic classifier 560. The VS fusion module 570 receives, for the tracked object, images from different cameras of sensors 530. The VS fusion module 570 then fuses lights from multiple images onto a unit cube or three-dimensional box for geometric and temporal interpretation (for geometric observations) or embeddings of semantic observation (for semantic observations).

The VS fusion module 570 can also apply temporal signal interpretation to a light to determine, e.g., whether the light is blinking. For example, the VS fusion module 570 can receive several images, at least three of which show the same light in an illuminated state (e.g., "ON"). The VS fusion module 570 can determine whether the light is blinking, based on timestamps of the several images.

In particular, if the light is blinking, rather than simply being illuminated, the light will be illuminated with periodicity. Thus, the VS fusion module 570 can identify, e.g., five images including the light. If the light is illuminated in the first, third, and fifth images, chronologically, and is not illuminated in the second and fourth images, chronologically, then the VS fusion module 570 can determine the timestamps of the first, third, and fifth images. If the VS fusion module 570 determines that the first timestamp differs from the third timestamp by the same amount as the third timestamp differs from the fifth timestamp, then the VS fusion module 570 can determine that the light is blinking at a periodicity approximately equal to the difference between, e.g., the first and third timestamps.

In some implementations, the VS fusion module 570 can determine a frequency of blinking or periodicity of blinking based on geometric observations and/or semantic observations from a series of images.
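
The periodicity check described above can be sketched as follows; the representation of observations as (timestamp, ON/OFF) pairs and the tolerance value are assumptions.

def detect_blinking(observations, tolerance=0.15):
    """Decide whether a light is blinking from timestamped ON/OFF observations.

    observations: list of (timestamp_seconds, is_on) tuples in chronological order.
    Returns (is_blinking, period_seconds_or_None).
    """
    on_times = [t for t, is_on in observations if is_on]
    if len(on_times) < 3:
        return False, None
    gaps = [b - a for a, b in zip(on_times, on_times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    # Blinking if successive ON observations are (approximately) evenly spaced
    regular = all(abs(g - mean_gap) <= tolerance * mean_gap for g in gaps)
    return regular, (mean_gap if regular else None)

# Example: ON at t = 0.0, 0.8, 1.6 s with OFF frames in between -> blinking at ~0.8 s
print(detect_blinking([(0.0, True), (0.4, False), (0.8, True), (1.2, False), (1.6, True)]))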

VS Fusion: Geometric Observation Fusion

A split situation arises if the one camera of sensors 530 captures animage of a left side of a three-dimensional object (e.g., a vehiclerepresented by polygon 602) and another camera of sensors 530 capturesan image of a right side of the same three-dimensional object, as shownin FIG. 7. A similar situation can arise if one camera captures theimage of the left side and later captures the image of the right side.Techniques can be used to geometrically associate light detections onthe same vehicle, which is represented by the same polygon 602.

In the split situation illustrated in FIG. 7, a left light of the object appears in the two-dimensional image 720 from camera 1, and a right light of the object appears in the two-dimensional image 722 from camera 2.

The light detector 550 can detect the left light in the image from camera 1 and the right light in the image from camera 2. As discussed above, the light detector 550 can detect the left light and associate the left light to a polygon within a relative space (or a three-dimensional box). The same light detector 550 or a different light detector can detect the right light and associate the right light to the same polygon. In some cases, the VS fusion module 570 can associate light detections with a unit cube 702 or a three-dimensional box. As illustrated in the example of FIG. 7, the VS fusion module 570 can receive from the light detector 550, or regress, the position information of the left light onto a unit cube 702 with a specific object identifier and the position information of the right light onto the unit cube 702 with the same object identifier.

The VS fusion module 570 can determine the images from cameras 1 and 2 are of the same three-dimensional object, since the left light and the right light are detected on the same unit cube 702. The VS fusion module 570 can take other information into account, such as one or more of: a tracking identifier of the three-dimensional object, a position of the three-dimensional object, a polygon of the three-dimensional object, or a timestamp of the images from cameras 1 and 2.

The left light and the right light can be at a same or substantially the same height (e.g., within 10 percent of the total height of the object) on the three-dimensional object.

In one example of a geometric observation fusion, the VS fusion module 570 can re-project the two observations onto a three-dimensional model to determine that, for example, two lights of the three-dimensional object are lit. Thus, the light detector 550 can establish a height of a light on a three-dimensional box representation of the three-dimensional object.
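For illustration only, the following Python sketch outlines one possible way to accumulate light detections from multiple cameras onto a shared unit-cube representation keyed by an object identifier. The class and method names are hypothetical and do not describe the actual implementation.

from collections import defaultdict

class UnitCubeFusion:
    def __init__(self):
        # object_id -> list of (face, u, v, color) detections on the unit cube
        self._cubes = defaultdict(list)

    def add_detection(self, object_id, face, u, v, color):
        # face: e.g. "left", "right", "front", "rear"; (u, v) in [0, 1]
        self._cubes[object_id].append((face, u, v, color))

    def lit_lights(self, object_id):
        # Return all lights fused onto the same object, regardless of camera.
        return self._cubes[object_id]

# Camera 1 sees the left light; camera 2 sees the right light of the same object.
fusion = UnitCubeFusion()
fusion.add_detection(object_id=42, face="rear", u=0.1, v=0.6, color="amber")
fusion.add_detection(object_id=42, face="rear", u=0.9, v=0.6, color="amber")
assert len(fusion.lit_lights(42)) == 2  # both lights attributed to one vehicle

Because both detections carry the same object identifier, they land on the same cube, which corresponds to the association of the left light and the right light described above.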

VS Fusion: Semantic Observation Fusion

Semantic information associated with a single light at a given point in time is insufficient to determine semantic meaning of vehicle signals. In one example of a semantic observation fusion, the VS fusion module 570 receives information indicating “Left Blinker ON” and information indicating “Right Blinker ON”. In another example of a semantic observation fusion, the VS fusion module 570 can receive information indicating the image of the left side of a vehicle indicates “Left Blinker ON” and “Right Side Occluded,” whereas the image of the right side of the vehicle indicates “Right Blinker ON” and “Left Side Occluded.” This information can include the images themselves, three-dimensional coordinates from the light detector, or messages including such semantic information from the semantic classifier.

Thus, the VS fusion module 570 generates semantic labels or an intermediary representation trained on semantic labels. The VS fusion module 570 then generates a final semantic label, such as “Hazards ON Configuration,” from the two intermediary semantic representations. The VS fusion module 570 provides this semantic label to, e.g., the vehicle control system 80 to control the autonomous vehicle 10.
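As a purely illustrative sketch, the following Python fragment shows one simple rule-based way such per-camera semantic observations could be combined into a final label. The rule set and the fallback labels are assumptions for illustration, not an exhaustive or authoritative mapping.

def fuse_semantic_observations(observations):
    # observations: a set of semantic labels gathered from one or more cameras
    if {"Left Blinker ON", "Right Blinker ON"} <= observations:
        return "Hazards ON Configuration"
    if "Left Blinker ON" in observations and "Right Side Occluded" in observations:
        return "Left Turn Signal ON (right side not visible)"
    if "Right Blinker ON" in observations and "Left Side Occluded" in observations:
        return "Right Turn Signal ON (left side not visible)"
    return "No Signal Detected"

# Observations from the left-side and right-side images are pooled; the result
# can be passed to, e.g., the vehicle control system 80.
label = fuse_semantic_observations({"Left Blinker ON", "Right Blinker ON"})
# -> "Hazards ON Configuration"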

Thus, in one implementation, the VS fusion module 570 accumulates observations for the tracked object and for the camera 424. Then, for a tracked object, the VS fusion module 570 can fuse the observations, interpret a static light layout (such as “these lights are together on a vehicle”), and interpret a dynamic semantic (such as “flashing hazards”), based on certain logic for detecting flashing hazards.

The VS fusion module 570 thus semantically produces vehicle signals 580 that can serve as a basis for controlling the vehicle control system 80 to control the AV 10. For example, the AV 10 can be braked if a vehicle ahead of the autonomous vehicle 10 has its brake lights on. The AV 10 can also be steered if, for example, the semantic vehicle signals indicate an approaching emergency vehicle. Under appropriate circumstances, the autonomous vehicle 10 can also be accelerated.

Alternative Design Using Pixel or Image Segmentation

Instead of doing occlusion reasoning using the tracker, image segmentation is used. In other words, ROIs and occlusion reasoning are determined solely based on the images captured by the cameras.

In an alternative design, the tracker-assisted occlusion reasoning is removed from the object projector 540, which instead receives pixel or image segmentation. Such segmentation informs the system whether a light is occluded or not. For example, the segmentation can indicate whether a light is on the three-dimensional object, whether the light is a background light (e.g., on a second car in an image of a first car) or is a reflected light, and whether there is transparency in front of the light (e.g., a traffic light viewed through the window of another vehicle). The image segmentation can remove the background.
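For illustration, the following Python sketch shows one way a per-pixel segmentation mask could be used to remove pixels not associated with the tracked vehicle before the search for lights. The class identifier and array shapes are assumptions made for this sketch only.

import numpy as np

VEHICLE_CLASS = 1  # hypothetical label id for the tracked vehicle's pixels

def mask_to_vehicle(image: np.ndarray, segmentation: np.ndarray) -> np.ndarray:
    # image:        H x W x 3 camera image
    # segmentation: H x W array of per-pixel class labels
    mask = (segmentation == VEHICLE_CLASS)
    masked = image.copy()
    masked[~mask] = 0  # remove background, reflections, and other vehicles
    return masked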

Summary of Selected Features

In one implementation, a body of the autonomous vehicle 10 (of FIG. 3) includes a plurality of cameras that record a plurality of images of an environment around the autonomous vehicle 10. The autonomous vehicle 10 performs a target localization using radar or LIDAR sensors. Thus, the autonomous vehicle 10 can identify and track a location of an object in three-dimensional space. The object projector 540 receives the identity of the tracked object and can project where and how the three-dimensional object will enter the field-of-view of the plurality of cameras and, thus, will appear in the images.

The object projector 540 can determine an ROI in the images received from the plurality of cameras. In one implementation, the light detector 550 locates the lights within the ROI, potentially rendering semantic labelling unnecessary.
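As a non-limiting sketch, the following Python fragment illustrates one way the projection and ROI determination could be performed, assuming a known 3x4 camera projection matrix obtained from calibration. The function name and interface are hypothetical.

import numpy as np

def project_box_to_roi(corners_3d: np.ndarray, P: np.ndarray,
                       image_w: int, image_h: int):
    # corners_3d: N x 3 corner points of the object's 3-D box (camera frame),
    # assumed to lie in front of the camera; P: 3 x 4 projection matrix.
    homogeneous = np.hstack([corners_3d, np.ones((corners_3d.shape[0], 1))])
    projected = (P @ homogeneous.T).T           # N x 3 homogeneous image points
    pixels = projected[:, :2] / projected[:, 2:3]
    # Clip to the image and return the axis-aligned region of interest.
    x0, y0 = np.clip(pixels.min(axis=0), 0, [image_w - 1, image_h - 1])
    x1, y1 = np.clip(pixels.max(axis=0), 0, [image_w - 1, image_h - 1])
    return int(x0), int(y0), int(x1), int(y1)

The returned rectangle can then be used to crop the two-dimensional image so that the light detector 550 or the semantic classifier 560 searches only within the ROI.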

The light detector 550 outputs the location of the lights to a VS fusion module 570. The semantic classifier 560 outputs semantic information of lights to the VS fusion module 570. The VS fusion module 570 can detect vehicle signals by fusing information from one or more light detectors 550 and one or more semantic classifiers 560.

Thus, the autonomous vehicle 10 can be improved by detecting signals on nearby vehicles (e.g., no signal, brakes, turn signals (left/right), hazards, reversing lights, flashing yellow school bus lights (pre-boarding), flashing red school bus lights (boarding), emergency lights such as those of police cars, firetrucks, and ambulances, and other vehicle lights, such as those of construction vehicles, garbage trucks, and meter maids).

Thus, the disclosed architecture can merge partial observations across cameras while allowing a different technique to be applied to images from different cameras. In some implementations, the VS fusion module 570 fuses observations from different cameras/techniques into a light configuration before a temporal interpretation.

Exemplary Vehicle Signal Fusion

FIG. 8 illustrates interpretation of detection results from a plurality of vehicle signal detectors in accordance with various implementations of the present disclosure. As described previously, the vehicle signal detectors, such as first vehicle signal detector 245, second vehicle signal detector 255, and Nth vehicle signal detector 265, can generate information, including one or more of: light detection results and semantic information of lights. The light detection results and/or semantic information of lights can be collected over time. Each vehicle signal detector can generate a vector of information over time, e.g., as time-series data. For example, first vehicle signal detector 245 can generate vector 810 of information over time, second vehicle signal detector 255 can generate vector 812 of information over time, and Nth vehicle signal detector 265 can generate vector 814 of information over time. All the vectors collated together can, over time, form a matrix 802 of information. The matrix 802 of information can be provided as input to an interpretation module 275.

In some cases, the interpretation module 275 may apply filter(s) or feature extraction schemes to the matrix 802 or a portion of matrix 802 to transform the information into features that can assist in fusion of information from a plurality of vehicle signal detectors. For instance, the interpretation module 275 may extract frequency information.
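For illustration only, the following Python sketch shows one way a matrix of time-series observations could be assembled and a dominant blink frequency extracted with a Fourier transform. The sampling rate and the 0/1 encoding of light state are assumptions made for this sketch, not details of the described implementations.

import numpy as np

def dominant_frequency(on_off: np.ndarray, sample_rate_hz: float) -> float:
    # on_off: 1-D array of 0/1 light states sampled at a fixed rate
    centered = on_off - on_off.mean()          # drop the DC component
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(centered), d=1.0 / sample_rate_hz)
    return float(freqs[np.argmax(spectrum)])

# Each detector contributes a row; columns are time steps (the matrix plays the
# role of matrix 802 in FIG. 8).
matrix = np.array([
    [1, 1, 0, 0, 1, 1, 0, 0],   # detector 1: blinking
    [1, 1, 1, 1, 1, 1, 1, 1],   # detector 2: steadily ON
    [0, 0, 0, 0, 0, 0, 0, 0],   # detector 3: OFF
])
blink_hz = dominant_frequency(matrix[0], sample_rate_hz=10.0)  # approx. 2.5 Hz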

In some cases, the interpretation module 275 can implement explicit logic rules for interpreting the information in matrix 802. In some cases, the interpretation module 275 may implement artificial intelligence techniques such as decision trees, probabilistic classifiers, naive Bayes classifiers, support vector machines, deep learning, a neural network, a convolutional neural network, a recurrent neural network, random forests, genetic algorithms, and/or reinforcement learning. The interpretation module 275 can implement supervised or unsupervised learning.

Exemplary Methods and Modifications

FIG. 9 illustrates a method for multi-modal object projection in accordance with various implementations of the present disclosure. In 902, an object projector receives a two-dimensional image of a first three-dimensional object from a camera on a body. In 904, the object projector receives a first signal indicating first three-dimensional coordinates of the first three-dimensional object. In 904, the object projector projects the first three-dimensional coordinates onto the two-dimensional image. In 906, the object projector determines a region of interest in the two-dimensional image based on the projection. In 908, a light detector or a semantic classifier searches for a light of the three-dimensional object within the region of interest. The method is illustrated by the description of FIGS. 1, 5, and 6.

FIG. 10 illustrates a method for multi-technique vehicle signal detection in accordance with various implementations of the present disclosure. In 1002, a vehicle signal detector searches for a first light of the three-dimensional object within the first two-dimensional image. In 1004, the vehicle signal detector searches for a second light within the second two-dimensional image. The first two-dimensional image and the second two-dimensional image can be different images, or they can be the same image. Likewise, the techniques used to search for the first light and the second light can be different or the same. In 1006, the vehicle signal detector determines that the second light belongs to the three-dimensional object. In 1008, the vehicle signal detector determines a semantic meaning of the first light and the second light. The method is illustrated by the description of FIGS. 2, 5, 7, and 8.

Implementations of the present disclosure are directed to searching for vehicle signals. The teachings of the disclosure are not so limited, and those teachings can also be applied to searching for emergency signals on emergency vehicles, traffic lights, and any other light encountered while driving.

Once the vehicle signal(s) is detected, the vehicle control system 80 can take action based on the detected vehicle signal(s). For example, the vehicle control system 80 can control the AV 10 to brake if the three-dimensional object is ahead of the AV and the three-dimensional object's brake lights are on. The vehicle control system 80 can steer the AV 10 to the side of a road if the vehicle signals of an emergency vehicle ahead of the autonomous vehicle 10 are detected. The vehicle control system 80 can accelerate the AV 10 if the three-dimensional object's headlights are approaching perpendicular to the body of the autonomous vehicle 10 and the object is thus about to strike the autonomous vehicle 10.

Computing System

FIG. 11 illustrates components of a computing system used in implementations described herein. Specifically, the components of FIG. 11 can be present in the AV 10.

System 1100 can be implemented within one computing device or distributed across multiple computing devices or sub-systems that cooperate in executing program instructions. In some implementations, the system 1100 can include one or more blade server devices, standalone server devices, personal computers, routers, hubs, switches, bridges, firewall devices, intrusion detection devices, mainframe computers, network-attached storage devices, smartphones and other mobile telephones, and other types of computing devices. The system hardware can be configured according to any computer architecture, such as a Symmetric Multi-Processing (SMP) architecture or a Non-Uniform Memory Access (NUMA) architecture.

The system 1100 can include one or more central processors 1110, which can include one or more hardware processors and/or other circuitry that retrieve and execute software 1120 from the storage 1130. The one or more central processors 1110 can be implemented within one processing device, chip, or package, and can also be distributed across multiple processing devices, chips, packages, or sub-systems that cooperate in executing program instructions. In one implementation, the system 1100 includes the one or more central processors 1110 and a GPU 1150. The GPU 1150 can benefit the visual/image processing in the computing system 1100. The GPU 1150, or any second-order processing element independent from the one or more central processors 1110 that is dedicated to processing imagery and other perception data in real or near real-time, can provide a significant benefit.

The storage 1130 can include one or more computer-readable storage media readable by the one or more central processors 1110 and that store software 1120. The storage 1130 can be implemented as one storage device and can also be implemented across multiple co-located or distributed storage devices or sub-systems. The storage 1130 can include additional elements, such as a controller, that communicate with the one or more central processors 1110. The storage 1130 can also include storage devices and/or sub-systems on which data and/or instructions are stored. The system 1100 can access one or more storage resources to access information to carry out any of the processes indicated by the software 1120.

The software 1120, including routines for at least partially performing at least one of the processes illustrated in the FIGURES, can be implemented in program instructions. Further, the software 1120, when executed by the system 1100 in general or the one or more central processors 1110, can direct, among other functions, the system 1100 or the one or more central processors 1110 to perform the vehicle signal detection as described herein.

In implementations where the system 1100 includes multiple computing devices, a server of the system or, in a serverless implementation, a peer can use one or more communications networks that facilitate communication among the computing devices. For example, the one or more communications networks can include or be a local or wide area network that facilitates communication among the computing devices. One or more direct communication links can be included between the computing devices. In addition, in some cases, the computing devices can be installed at geographically distributed locations. In other cases, the multiple computing devices can be installed at one geographic location, such as a server farm or an office.

The system 1100 can include a communications interface 1140 that provides one or more communication connections and/or one or more devices that allow for communication between the system 1100 and other computing systems (not shown) over a communication network or collection of networks (not shown) or the air.

As described herein, one aspect of the present technology is the gathering and use of data available from various sources to improve quality and experience. The present disclosure contemplates that, in some instances, this gathered data may include personal information. The present disclosure contemplates that the entities involved with such personal information respect and value privacy policies and practices.

As used herein, the terms “storage media,” “computer-readable storage media,” or “computer-readable storage medium” can refer to non-transitory storage media, such as a hard drive, a memory chip, and cache memory, and to transitory storage media, such as carrier waves or propagating signals.

Aspects of the perception system for an autonomous vehicle can be embodied in various manners (e.g., as a method, a system, a computer program product, or one or more computer-readable storage media). Accordingly, aspects of the present disclosure can take the form of a hardware implementation, a software implementation (including firmware, resident software, or micro-code), or an implementation combining software and hardware aspects that can generally be referred to herein as a “circuit,” “module,” or “system.” Functions described in this disclosure can be implemented as an algorithm executed by one or more hardware processing units, e.g., one or more microprocessors of one or more computers. In various embodiments, different operations and portions of the operations of the methods described can be performed by different processing units. Furthermore, aspects of the present disclosure can take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied, e.g., encoded or stored, thereon. In various implementations, such a computer program can, for example, be downloaded (or updated) to existing devices and systems or be stored upon manufacturing of these devices and systems.

The detailed description presents various descriptions of specific implementations. The innovations described can be implemented in a multitude of different ways, for example, as defined and covered by the claims and/or select examples. In the description, reference is made to the drawings, where like reference numerals can indicate identical or functionally similar elements. Elements illustrated in the drawings are not necessarily drawn to scale. Additionally, certain embodiments can include more elements than illustrated in a drawing and/or a subset of the elements illustrated in a drawing. Further, some embodiments can incorporate a suitable combination of features from two or more drawings.

The disclosure describes various illustrative implementations and examples for implementing the features and functionality of the present disclosure. While components, arrangements, and/or features are described in connection with various example implementations, these are merely examples to simplify the present disclosure and are not intended to be limiting. In the development of any actual implementation, numerous implementation-specific decisions can be made to achieve the developer's specific goals, including compliance with system, business, and/or legal constraints, which can vary from one implementation to another. Additionally, while such a development effort might be complex and time-consuming, it would be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.

In the Specification, reference was made to the spatial relationships between various components and to the spatial orientation of various aspects of components as depicted in the attached drawings. The devices, components, members, and apparatuses described can be positioned in any orientation. Thus, the use of terms such as “above”, “below”, “upper”, “lower”, “top”, “bottom”, or other similar terms to describe a spatial relationship between various components or to describe the spatial orientation of aspects of such components describes a relative relationship between the components or a spatial orientation of aspects of such components, respectively, as the components described can be oriented in any direction. When used to describe a range of dimensions or other characteristics (e.g., time, pressure, temperature, length, width, etc.) of an element, operations, and/or conditions, the phrase “between X and Y” represents a range that includes X and Y. The systems, methods, and devices of this disclosure have several innovative aspects, no one of which is solely responsible for the attributes disclosed herein. Some objects or advantages might not be achieved by implementations described herein. Thus, for example, certain implementations can operate in a manner that achieves or optimizes one advantage or group of advantages as taught herein, and not other objects or advantages as may be taught or suggested herein.

In one example implementation, any number of electrical circuits of the FIGS. can be implemented on a board of an associated electronic device. The board can be a general circuit board that can hold various components of the internal electronic system of the electronic device and, further, provide connectors for other peripherals. More specifically, the board can provide the electrical connections by which other components of the system can communicate electrically. Any processors (inclusive of digital signal processors, microprocessors, supporting chipsets, etc.) and computer-readable non-transitory memory elements can be coupled to the board based on configurations, processing demands, computer designs, etc. Other components, such as external storage, additional sensors, controllers for audio/video display, and peripheral devices, can be attached to the board as plug-in cards, via cables, or integrated into the board itself. In various implementations, the functionalities described herein can be implemented in emulation form as software or firmware running within one or more configurable (e.g., programmable) elements arranged in a structure that supports these functions. The software or firmware providing the emulation can be provided on one or more non-transitory, computer-readable storage media including instructions to allow one or more processors to carry out those functionalities.

In another example implementation, the electrical circuits of the FIGS. can be implemented as standalone modules (e.g., a device with associated components and circuitry configured to perform a specific application or function) or implemented as plug-in modules into application-specific hardware of electronic devices. Implementations of the present disclosure can be readily included in a system-on-chip (SOC) package. An SOC represents an integrated circuit (IC) that integrates components of a computer or other electronic system into one chip. The SOC can contain digital, analog, mixed-signal, and often radio frequency functions that can be provided on one chip substrate. Other embodiments can include a multi-chip-module (MCM), with a plurality of separate ICs located within one electronic package and that interact through the electronic package. In various other implementations, the processors can be implemented in one or more silicon cores in Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), and other semiconductor chips.

The specifications, dimensions, and relationships outlined herein (e.g., the number of processors and logic operations) have been offered for non-limiting purposes of example and teaching. Such information can be varied considerably. For example, various modifications and changes can be made to arrangements of components. The description and drawings are, accordingly, to be regarded in an illustrative sense, not in a restrictive sense.

With the numerous examples provided herein, interaction was described in terms of two, three, four, or more electrical components for purposes of clarity and example. The system can be consolidated in any manner. Along similar design alternatives, the illustrated components, modules, and elements of the FIGS. can be combined in various possible configurations that are clearly within the scope of this disclosure. In certain cases, it might be easier to describe one or more of the functionalities of a given set of flows by referencing a limited number of electrical elements. The electrical circuits of the FIGS. and their teachings are readily scalable and can accommodate many components, as well as more complicated/sophisticated arrangements and configurations. Accordingly, the examples provided do not limit the scope or inhibit the teachings of the electrical circuits as potentially applied to a myriad of other architectures.

In this disclosure, references to various features (e.g., elements, structures, modules, components, steps, operations, characteristics, etc.) included in “one implementation”, “example implementation”, “an implementation”, “another implementation”, “some implementations”, “various implementations”, “other implementations”, “alternative implementation”, and the like are intended to mean that any such features are included in one or more implementations of the present disclosure and might or might not necessarily be combined in the same implementations.

The functions related to multi-modal and/or multi-technique vehicle signal detection, e.g., those summarized in the one or more processes shown in the FIGURES, illustrate some of the possible functions that can be executed by, or within, the autonomous vehicle illustrated in the FIGS. Some of these operations can be deleted or omitted where appropriate, or these operations can be modified or changed considerably. In addition, the timing of these operations can be altered considerably. The preceding operational flows have been offered for purposes of example and discussion. Implementations described herein provide flexibility in that any suitable arrangements, chronologies, configurations, and timing mechanisms can be provided.

SELECTED EXAMPLES

Example 1 is a method comprising: receiving a two-dimensional image of a first three-dimensional object from a camera on a body; receiving a first signal indicating first three-dimensional coordinates of the first three-dimensional object; projecting the first three-dimensional coordinates onto the two-dimensional image; determining a region of interest in the two-dimensional image based on the projection; and searching for a light of the three-dimensional object within the region of interest.

In Example 2, the method of Example 1 can optionally include projecting the first three-dimensional coordinates onto the two-dimensional image comprising applying a matrix transformation to the three-dimensional coordinates.

In Example 3, the method of Example 1 or 2 can optionally include controlling a vehicle, based on the light of the three-dimensional object, wherein the controlling is accelerating, braking, or steering the vehicle.

In Example 4, the method of any one of Examples 1-3 can optionally include: receiving a second signal indicating second three-dimensional coordinates of a second three-dimensional object; and determining a portion of the first three-dimensional object is occluded in the two-dimensional image by the second three-dimensional object, based on the first three-dimensional coordinates of the first three-dimensional object and the second three-dimensional coordinates of the second three-dimensional object.

In Example 5, the method of any one of Examples 1-4 can optionally include projecting the first three-dimensional coordinates comprising: projecting a surface of a three-dimensional polygon corresponding to the first three-dimensional object.

In Example 6, the method of any one of Examples 1-5 can optionally include: the signal indicating the three-dimensional coordinates of the first three-dimensional object including three-dimensional coordinates of a plurality of surfaces of the first three-dimensional object, and the region of interest being determined based on the three-dimensional coordinates of the plurality of surfaces of the first three-dimensional object.

In Example 7, the method of any one of Examples 1-6 can optionally include: cropping the two-dimensional image to the region of interest to produce a cropped image, wherein the searching is performed on the cropped image.

Example 8 is a system comprising: a memory including instructions; a processor to execute the instructions; and an object projector encoded in the instructions to: receive a two-dimensional image of a first three-dimensional object from a camera on a body; receive a first signal indicating first three-dimensional coordinates of the first three-dimensional object; project the first three-dimensional coordinates onto the two-dimensional image; determine a region of interest in the two-dimensional image based on the projection; and search for a light of the three-dimensional object within the region of interest.

In Example 9, the system of Example 8 can optionally include projecting the first three-dimensional coordinates onto the two-dimensional image comprising applying a matrix transformation to the three-dimensional coordinates.

In Example 10, the system of Example 8 or 9 can optionally include a vehicle control system to accelerate, brake, or steer a vehicle, based on the light of the three-dimensional object.

Example 11 is a method comprising: searching for a first light of a three-dimensional object within a first two-dimensional image of the three-dimensional object captured by a camera on a vehicle; searching for a second light within a second two-dimensional image; determining that the second light belongs to the three-dimensional object; and determining a semantic meaning of the first light and the second light.

In Example 12, the method of Example 11 can optionally include controlling the vehicle, based on the semantic meaning of the first light and the second light, wherein the controlling is accelerating, braking, or steering the vehicle.

In Example 13, the method of Example 11 or 12 can optionally include: determining a first location of the first light on the three-dimensional object and a first color of the first light in the first two-dimensional image, wherein the semantic meaning is based on the first location and the first color.

In Example 14, the method of Example 13 can optionally include: projecting the first light onto a first represented location of a three-dimensional representation of the three-dimensional object, based on the first location; determining a second location of the second light on the three-dimensional object and a second color of the second light in the second two-dimensional image; and projecting the second light onto a second represented location of the three-dimensional representation of the three-dimensional object, based on the second location, wherein the semantic meaning is based on the first represented location, the second represented location, the first color, and the second color.

In Example 15, the method of any one of Examples 11-14 can optionally include: determining a first semantic meaning of the first light; and determining a second semantic meaning of the second light, wherein the semantic meaning is based on the first semantic meaning and the second semantic meaning.

In Example 16, the method of any one of Examples 11-15 can optionally include: receiving a first timestamp of a time at which the first two-dimensional image was captured by the camera; receiving a second timestamp of a time at which the second two-dimensional image was captured; and determining that a light of the three-dimensional object is blinking, based on the first timestamp and the second timestamp, wherein the semantic meaning is based on the determining that the light is blinking.

In Example 17, the method of any one of Examples 11-16 can optionally include the semantic meaning being based on a first color of the first light in the first two-dimensional image and a second color of the second light in the second two-dimensional image.

Example A is a vehicle comprising: a memory including instructions; a processor to execute the instructions; a body including a camera; and a vehicle signal detector encoded in the instructions to implement any one or more of Examples 1-7, 11-17, and methods described herein.

Example B is one or more non-transitory, computer-readable media encoded with instructions that, when executed by one or more processing units, perform a method according to any one or more of Examples 1-7, 11-17, and methods described herein.

Example C is an apparatus that includes means to implement and/or carry out any one of the Examples herein.

Various implementations can combine features of Examples herein.

What is claimed is:
1. A method, comprising: receiving a two-dimensional image of a vehicle from a camera; receiving three-dimensional coordinates corresponding to sides of a vehicle; projecting the three-dimensional coordinates onto the two-dimensional image; determining a region of interest in the two-dimensional image based on the projection; searching for a vehicle signal light of the vehicle within the region of interest; and associating the vehicle signal light found within the region of interest to a particular side of a vehicle.
2. The method of claim 1, wherein projecting the three-dimensional coordinates onto the two-dimensional image comprises applying a matrix transformation to the three-dimensional coordinates.
3. The method of claim 1, further comprising: controlling a vehicle, based on the vehicle signal light of the vehicle, wherein the controlling is accelerating, braking, or steering the vehicle.
4. The method of claim 1, further comprising: determining a portion of the vehicle is occluded in the two-dimensional image; and in response to determining the portion of the vehicle is occluded in the two-dimensional image, applying a mask to the two-dimensional image to remove pixels not associated with regions where vehicle signal lights would be present.
5. The method of claim 1, wherein projecting the three-dimensional coordinates comprises: projecting the three-dimensional coordinates corresponding to a front face of the vehicle onto the two-dimensional image.
6. The method of claim 1, wherein projecting the three-dimensional coordinates comprises: projecting the three-dimensional coordinates corresponding to a rear face of the vehicle onto the two-dimensional image.
7. The method of claim 1, further comprising: cropping the two-dimensional image to the region of interest to produce a cropped image, wherein the searching is performed on the cropped image.
8. One or more non-transitory, computer-readable media encoded with instructions that, when executed by one or more processing units, perform a method comprising: receiving a two-dimensional image of a vehicle from a camera; receiving three-dimensional coordinates corresponding to sides of a vehicle; projecting three-dimensional coordinates of a first side of the vehicle onto the two-dimensional image; determining a first region of interest in the two-dimensional image based on the projection; and searching for a vehicle signal light of the vehicle within the first region of interest.
9. The one or more non-transitory, computer-readable media of claim 8, further comprising: associating the vehicle signal light in the first region of interest to the first side of the vehicle.
10. The one or more non-transitory, computer-readable media of claim 8, wherein projecting the three-dimensional coordinates of the first side onto the two-dimensional image comprises applying a matrix transformation to the three-dimensional coordinates.
11. The one or more non-transitory, computer-readable media of claim 8, the method further comprising: determining a portion of the vehicle is occluded in the two-dimensional image; and in response to determining the portion of the vehicle is occluded in the two-dimensional image, applying a mask to the two-dimensional image to remove pixels not associated with the vehicle.
12. The one or more non-transitory, computer-readable media of claim 8, further comprising: projecting three-dimensional coordinates of a second side of the vehicle onto the two-dimensional image; determining a second region of interest in the two-dimensional image based on the projection; and searching for a further vehicle signal light of the vehicle within the second region of interest.
13. The one or more non-transitory, computer-readable media of claim 12, further comprising: associating the vehicle signal light in the first region of interest to the second side of the vehicle.
14. The one or more non-transitory, computer-readable media of claim 8, wherein the first side corresponds to a front face or rear face of a vehicle.
15. A system, comprising: one or more memories including instructions; one or more processors to execute the instructions; and an object projector, encoded in the instructions, to: receive a two-dimensional image of a vehicle from a camera; receive three-dimensional coordinates corresponding to different sides of a vehicle; project three-dimensional coordinates of a selected side onto the two-dimensional image; and determine a region of interest in the two-dimensional image based on the projection; and a vehicle signal detector, encoded in the instructions, to: search for a vehicle signal light of the vehicle within the region of interest; and associate the vehicle signal light found within the region of interest to the selected side of a vehicle.
16. The system of claim 15, wherein the system is an autonomous vehicle.
17. The system of claim 15, wherein the selected side is a front side of the vehicle.
18. The system of claim 15, wherein the selected side is a rear side of the vehicle.
19. The system of claim 16, wherein the object projector is further to: project three-dimensional coordinates of a further selected side onto the two-dimensional image; and determine a further region of interest in the two-dimensional image based on the projection.
20. The system of claim 19, wherein the vehicle signal detector is further to: search for a further vehicle signal light of the vehicle within the region of interest; and associate the further vehicle signal light found within the region of interest to the further selected side of the vehicle.